Prosody change and response timing analysis in spontaneously spoken dialogs and their modeling in a spoken dialog system
Authors
Abstract
If a dialog system could respond to a user as naturally as a human does, interaction would be smoother. Imitating the prosodic behavior of human utterances is important for natural human-computer conversation. In this paper, to develop a cooperative, friendly spoken dialog system, we analyzed how F0 synchrony tendency and overlap frequency correlate with three subjective measures in human-human dialogs: “liveliness,” “familiarity,” and “informality.” We also modeled the properties of these features and implemented the model in our dialog system, which generates, in real time, the response timing of aizuchi (back-channels) and turn-taking based on a decision tree, together with dynamic F0 changes, to realize chat-like conversations.
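The correlation analysis described in the abstract can be illustrated with a minimal sketch. The abstract does not define how F0 synchrony or overlap frequency is computed, so the definitions below are assumptions made purely for illustration: `f0_synchrony` is taken as the Pearson correlation of the two speakers' log-F0 values over paired turns, and `overlap_frequency` as the number of overlapping utterances per turn. All function names are hypothetical and are not the authors' implementation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def f0_synchrony(f0_speaker_a, f0_speaker_b):
    """Assumed synchrony score: correlation of log-F0 over paired turns.

    Each argument holds one mean F0 value (in Hz) per turn; element i of
    both lists is assumed to belong to the same adjacent turn pair.
    """
    log_a = [math.log(f) for f in f0_speaker_a]
    log_b = [math.log(f) for f in f0_speaker_b]
    return pearson(log_a, log_b)


def overlap_frequency(num_overlaps, num_turns):
    """Assumed definition: overlapping utterances per turn in one dialog."""
    if num_turns <= 0:
        raise ValueError("num_turns must be positive")
    return num_overlaps / num_turns
```

With one synchrony score and one subjective rating (for example, "liveliness") per dialog, `pearson(synchrony_scores, liveliness_ratings)` would give the kind of correlation the abstract refers to, under the assumed definitions above.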
Related papers
Responding to user emotional state by adding emotional coloring to utterances
When people speak to each other, they share a rich set of nonverbal behaviors such as varying prosody in voice. These behaviors, sometimes interpreted as demonstrations of emotions, call for appropriate responses, but today’s spoken dialog systems lack the ability to do so. We collected a corpus of persuasive dialogs, specifically conversations about graduate school between a staff member and s...
Analysis of relationship between impression of human-to-human conversations and prosodic change and its modeling
If a dialog system could respond to a user as naturally as a human, the interaction would be smoother. Imitating human prosodic characteristics of utterances is important in computer-to-human natural interaction. To develop a cooperative/friendly spoken dialog system, we analyzed the correlation between the fundamental frequency’s synchrony tendency, or overlap frequency, and subjective measures...
Automatic user-adaptive speaking rate selection for information delivery
Today there are many services which provide information over the phone using a prerecorded or synthesized voice. These voices are invariant in speed. Humans giving information over the telephone, however, tend to adapt the speed of their presentation to suit the needs of the listener. This paper presents a preliminary model of this adaptation. In a corpus of simulated directory assistance dialo...
Mining Spoken Dialogue Corpora for System Evaluation and Modeling
We are interested in the problem of modeling and evaluating spoken language systems in the context of human-machine dialogs. Spoken dialog corpora allow for a multidimensional analysis of speech recognition and language understanding models of dialog systems. Therefore language models can be directly trained based either on the dialog history or its equivalence class (or cluster). In this paper...
Detection of Task-Incomplete Dialogs Based on Utterance-and-Behavior Tag N-Gram for Spoken Dialog Systems
We propose a method of detecting “task incomplete” dialogs in spoken dialog systems using N-gram-based dialog models. We used a database created during a field test in which inexperienced users used a client-server music retrieval system with a spoken dialog interface on their own PCs. In this study, the dialog for a music retrieval task consisted of a sequence of user and system tags that rela...